Project title: “Customer Behavioural Analytics in the Retail Sector”
Names of Team Members:
1. Nadiia Honcharenko (220681, nadiia.honcharenko@st.ovgu.de)
2. Rutuja Shivraj Pawar (220051, rutuja.pawar@ovgu.de)
3. Shivani Jadhav (223856, shivani.jadhav@st.ovgu.de)
4. Sumit Kundu (217453, sumit.kundu@st.ovgu.de)
Under the Guidance of: M.Sc. Uli Niemann
Date: November 15, 2018
Background and Motivation: Customers are central to the success of any business. Conventional wisdom holds that retaining an existing customer costs far less than acquiring a new one. For sustainable growth, a business must both retain its existing customer base and expand it with new customers. This requires the business to understand how its customers behave in relation to it. Obtaining a 360° view of its customers is therefore crucial for a business seeking a competitive edge in the market. In this setting, Customer Behavioural Analytics plays an important role by applying data analytics to find meaningful behavioural patterns in customer-specific business data.
Consequently, this project aims to understand consumer behaviour in the retail sector. Decoding this behaviour means understanding how consumers make purchase decisions and which factors influence those decisions. The project also aims to discover dependencies between customers, products and shops in order to gain further insight into customer behaviour. These insights can help a business implement strategies that increase revenue through customer satisfaction.
Project objectives: This project addresses the problem of understanding the behaviour of customers of Coop, an Italian retail distribution company, in a single Italian city. It intends to derive analytical insights into the customers' purchase behaviour by answering the following Research Questions (RQ):
1. Are customers willing to travel long distances to purchase products in spite of the high average product price in a shop?
Relevance: This will help to understand whether price is an important factor in the purchase decisions of the majority of customers.
2. For which products are customers ready to travel long distances, and for which products do they select the closest shop?
Relevance: This will help to understand the nature of the products in the context of proximity. We assume that customers choose the closest shop for everyday products such as groceries but may travel long distances for one-time purchases such as kitchen equipment. Because data science is driven by results rather than intuition alone, answering this question lets us verify that assumption.
3. What is the maximum likelihood of a customer selecting a particular shop to purchase a particular product?
Relevance: This will help to understand which shops in the retail chain are in demand for a particular product. This can in turn support better stock management to meet customer demand.
4. How do the shops rank in terms of attracting customers and thus generating revenue, and what is the customer base of each shop?
Relevance: This will help to understand the most popular shops in the retail chain and target different shop-level marketing schemes to the appropriate customers through customer segmentation.
5. Which customers are the most profitable in terms of revenue generation?
Relevance: This will help to identify customers with a potentially high Customer Lifetime Value and to target appropriate loyalty programs at them, turning satisfied, loyal customers into advocates for the business.
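One possible reading of RQ3 can be sketched as follows: for each (customer, product) pair, estimate the probability of each shop as its relative purchase frequency, which is the maximum likelihood estimate under a simple categorical model, and report the shop with the highest estimate. This is a minimal sketch in Python for illustration only; the record layout, the identifiers and the function names are hypothetical and are not taken from the Coop dataset.

```python
from collections import Counter, defaultdict

# Hypothetical purchase records: (customer_id, product_id, shop_id).
# The values are illustrative and do not come from the Coop dataset.
purchases = [
    ("c1", "milk", "s1"),
    ("c1", "milk", "s1"),
    ("c1", "milk", "s2"),
    ("c1", "pan", "s3"),
    ("c2", "milk", "s2"),
]


def shop_likelihoods(records):
    """Estimate P(shop | customer, product) by relative frequency."""
    counts = defaultdict(Counter)
    for customer, product, shop in records:
        counts[(customer, product)][shop] += 1

    likelihoods = {}
    for key, shop_counts in counts.items():
        total = sum(shop_counts.values())
        likelihoods[key] = {shop: n / total for shop, n in shop_counts.items()}
    return likelihoods


def most_likely_shop(likelihoods, customer, product):
    """Return the (shop, probability) pair with the highest estimate."""
    distribution = likelihoods[(customer, product)]
    return max(distribution.items(), key=lambda item: item[1])


likelihoods = shop_likelihoods(purchases)
best_shop, probability = most_likely_shop(likelihoods, "c1", "milk")
print(best_shop, round(probability, 3))
```

With sparse data such raw frequencies can be unreliable, so a real analysis would probably need smoothing or a model that shares information across customers.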
Name of the dataset to be used: Supermarket aggr.Customer
The dataset to be used is the retail market data of Coop, one of the largest Italian retail distribution companies, for a single Italian city (Pennacchioli et al. 2013).
The Supermarket aggr.Customer dataset used for the analysis contains data aggregated from the original datasets (Pennacchioli et al. 2013) and mapped to new columns. It contains 40 features and 60,366 instances and is approximately 14.0 MB in size.
Design overview: Below are some of the algorithms and methods that we plan to use in our project:
1. Support Vector Machine (SVM)
We will approach RQ1 as a classification task and use an SVM to classify whether or not a customer is willing to travel long distances to purchase products, given a high average product price in a shop.
2. k-means Clustering
RQ2, RQ4 and RQ5 require segmenting products, customers and shops into multiple clusters. We plan to use k-means clustering to partition the data into clusters and base our analysis on them.
3. Naive Bayes
RQ3 requires a maximum likelihood estimate involving customers, products and shops. Hence we plan to build a model based on Naive Bayes for this estimation.
4. Apriori algorithm
RQ3 also requires associations to be drawn between customers and shops for different products. Hence we plan to use the Apriori algorithm to determine the association patterns in the data.
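To illustrate the clustering step planned for RQ2, RQ4 and RQ5, the following is a minimal sketch of k-means (Lloyd's algorithm) using only the Python standard library. The two features (total spend and average travel distance), the sample values and the deterministic choice of initial centroids are assumptions made for this example; the project's own implementation and feature set may differ.

```python
import math


def kmeans(points, k, iterations=20):
    """Minimal Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption for reproducibility: the first k points are the initial
    # centroids. Practical implementations usually initialise randomly.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])

        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids

    return labels, centroids


# Hypothetical customers: (total_spend_eur, avg_distance_km).
customers = [
    (120.0, 1.0), (950.0, 6.5), (110.0, 1.2),
    (1000.0, 7.0), (130.0, 0.8), (980.0, 6.0),
]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

The features here are left unscaled for brevity. Because k-means relies on Euclidean distance, features on very different scales would normally be standardised first, and the number of clusters k would need to be chosen, for example by comparing within-cluster variance across several values of k.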
| Week | Responsibilities and Workload Distribution |
|---|---|
| 19.11. | Nadiia & Rutuja: Data Cleaning; Shivani: Data Transformation; Sumit: Data Reduction |
| 26.11. | Nadiia, Rutuja, Shivani, Sumit: EDA Data Modeling, with each team member working on different modeling techniques |
| 03.12. | Nadiia, Rutuja, Shivani, Sumit: EDA Data Visualization, with each team member working on different visualization techniques |
| 10.12. | Nadiia, Rutuja, Shivani, Sumit: Website development, with each team member creating a different webpage |
| 17.12. | Nadiia, Rutuja, Shivani, Sumit: Answering a different Research Question per member through different machine learning models |
| 24.12. | Nadiia, Rutuja, Shivani, Sumit: Answering a different Research Question per member through different machine learning models |
| 31.12. | Nadiia and Shivani: Website Integration with the Project data; Rutuja and Sumit: Project Screencast creation |
| 07.01. | Nadiia, Rutuja, Shivani, Sumit: Final Project Detailing and Polishing |
| 14.01. | Nadiia, Rutuja, Shivani, Sumit: Project Wrap-up |
GitHub Repository: https://github.com/Rspawar/Data-Science-with-R.git
References:
Pennacchioli, Diego, Michele Coscia, Salvatore Rinzivillo, Dino Pedreschi, and Fosca Giannotti. 2013. “Explaining the Product Range Effect in Purchase Data.” In 2013 IEEE International Conference on Big Data, 648–56. IEEE.